Indefinite-Horizon POMDPs with Action-Based Termination
Abstract
For decision-theoretic planning problems with an indefinite horizon, plan execution terminates after a finite number of steps with probability one, but the number of steps until termination (i.e., the horizon) is uncertain and unbounded. In the traditional approach to modeling such problems, called a stochastic shortest-path problem, plan execution terminates when a particular state is reached, typically a goal state. We consider a model in which plan execution terminates when a stopping action is taken. We show that an action-based model of termination has several advantages for partially observable planning problems. It does not require a goal state to be fully observable; it does not require achievement of a goal state to be guaranteed; and it allows a proper policy to be found more easily. This framework allows many partially observable planning problems to be modeled in a more realistic way that does not require an artificial discount factor.
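The stopping-action idea above can be illustrated with a minimal belief-space value iteration sketch. This is a hypothetical toy example, not code from the paper: a two-state POMDP where the hidden state marks whether the task is done, a "continue" action costs 1 and stochastically completes the task, and a "stop" action terminates with a bounded terminal reward that depends on the hidden state. Because stopping is always available and yields a bounded reward, value iteration on the belief converges without a discount factor. All constants (completion probability, observation accuracy, rewards) are invented for illustration.

```python
# Hypothetical 2-state POMDP with action-based termination.
# Hidden state s in {0, 1} (1 = task done).
#   "stop":     terminates; reward +10 if s == 1, else -2.
#   "continue": costs 1, completes the task with prob. 0.3,
#               then emits a noisy observation (accuracy 0.8).
# The agent plans over its belief b = P(s = 1), discretized on a grid.
import numpy as np

P_DONE = 0.3                     # chance "continue" completes the task
ACC = 0.8                        # observation accuracy
R_STOP = {0: -2.0, 1: 10.0}      # terminal reward of the stop action
C_CONT = 1.0                     # per-step cost of continuing

def belief_update(b, obs):
    """Bayes update of P(s=1) after taking 'continue' and seeing obs."""
    b_pred = b + (1 - b) * P_DONE                # transition step
    like1 = ACC if obs == 1 else 1 - ACC         # P(obs | s = 1)
    like0 = 1 - ACC if obs == 1 else ACC         # P(obs | s = 0)
    num = like1 * b_pred
    return num / (num + like0 * (1 - b_pred))

def value_iteration(n_pts=101, iters=200):
    """Undiscounted value iteration on a belief grid."""
    grid = np.linspace(0.0, 1.0, n_pts)
    V = np.zeros(n_pts)
    for _ in range(iters):
        V_new = np.empty(n_pts)
        for i, b in enumerate(grid):
            # Terminating is always an option with bounded reward.
            q_stop = b * R_STOP[1] + (1 - b) * R_STOP[0]
            # Expected value of continuing, over both observations.
            b_pred = b + (1 - b) * P_DONE
            q_cont = -C_CONT
            for obs in (0, 1):
                like1 = ACC if obs == 1 else 1 - ACC
                like0 = 1 - ACC if obs == 1 else ACC
                p_obs = like1 * b_pred + like0 * (1 - b_pred)
                q_cont += p_obs * np.interp(belief_update(b, obs), grid, V)
            V_new[i] = max(q_stop, q_cont)
        V = V_new
    return grid, V

grid, V = value_iteration()
# When the belief that the task is done is high, stopping dominates;
# when it is low, continuing to work (and observe) is worth the cost.
print(V[0], V[-1])
```

Note that no discount factor appears anywhere: the ever-available stop action makes every sensible policy proper, which is exactly the advantage the abstract claims for action-based termination.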
Similar Papers
Achieving goals in decentralized POMDPs
Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach to model these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that termi...
Improved Planning for Infinite-Horizon Interactive POMDPs using Probabilistic Inference (Extended Abstract)
We provide the first formalization of self-interested multiagent planning using expectation-maximization (EM). Our formalization in the context of infinite-horizon and finitely-nested interactive POMDP (I-POMDP) is distinct from EM formulations for POMDPs and other multiagent planning frameworks. Specific to I-POMDPs, we exploit the graphical model structure and present a new approach based on b...
Genetic Algorithms for Approximating Solutions to POMDPs
We use genetic algorithms (GAs) to find good finite-horizon policies for POMDPs, where the search is limited to policies with a fixed finite amount of policy memory. Initial results were presented in (Lusena et al. 1999) with one GA. In this paper, different cross-over and mutation rates are compared. Initializing the population of the genetic algorithm is done using smaller genetic algorithms. The sele...
Efficient Planning for Factored Infinite-Horizon DEC-POMDPs
Decentralized partially observable Markov decision processes (DEC-POMDPs) are used to plan policies for multiple agents that must maximize a joint reward function but do not communicate with each other. The agents act under uncertainty about each other and the environment. This planning task arises in optimization of wireless networks, and other scenarios where communication between agents is r...
Policy Filtering for Planning in Partially Observable Stochastic Domains
Partially observable Markov decision processes (POMDPs) can be used as a model for planning in stochastic domains. This paper considers the problem of computing optimal policies for finite-horizon POMDPs. In deciding on an action to take, an agent is not only concerned with how the action would affect the current time point, but also its impacts on the rest of the planning horizon. In a POMDP, the ...